
Devstral: An Open-Source Agentic LLM for Software Engineering

Published on May 30, 2025

Introduction

Well, it looks like 2025 may really be the year of agents. While there has been plenty of speculation about agent development, and definitions and implementations vary considerably, we’re seeing meaningful progress in how these agents are materializing – particularly software engineering agents. In this article, we’re going to take a look at Devstral, an open-source agentic LLM developed through a collaboration between Mistral AI and All Hands AI.

Devstral Overview

Devstral is designed for agentic coding – that is, for solving multi-step tasks within large codebases. Thanks to its lightweight design of just 24 billion parameters, it can run on a single NVIDIA RTX 4090 GPU, facilitating local deployment, on-device use, and privacy-sensitive applications. As an open-source model released under the Apache 2.0 license, Devstral is freely available for commercial use, modification, and integration into proprietary products. Furthermore, the model has a 128k-token context window, enabling it to process substantial amounts of code and instructions at a time, which is particularly beneficial for large codebases and complex problems. Finally, Devstral uses Mistral’s Tekken tokenizer with a 131k vocabulary, enhancing its precision and efficiency in handling code and text inputs for accurate, context-aware responses tailored to software engineering.
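To make the tokenizer concrete, here is a minimal sketch of tokenizing a chat request with the mistral-common package (our choice for illustration; the exact API may differ across versions):

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the Tekken tokenizer used by recent Mistral models
tokenizer = MistralTokenizer.v3(is_tekken=True)

# Tokenize a chat-formatted request and count the tokens it consumes
request = ChatCompletionRequest(
    messages=[UserMessage(content="Fix the off-by-one error in my loop.")]
)
tokenized = tokenizer.encode_chat_completion(request)
print(len(tokenized.tokens))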

Before we describe Devstral’s performance, let’s lay the groundwork by discussing SWE-Bench, the current standard for evaluating LLMs on practical coding challenges.

SWE-Bench Primer

SWE-Bench is an evaluation framework designed to assess the ability of LLMs to perform software engineering tasks. The benchmark consists of 2,294 software engineering problems drawn from real GitHub issues and their corresponding pull requests across 12 popular Python repositories.
To enhance the reliability of evaluations, OpenAI introduced SWE-Bench Verified, a curated subset of 500 tasks from the original benchmark. These tasks were reviewed by professional software developers and further categorized by difficulty, with 196 tasks deemed “easy” (requiring less than 15 minutes to fix) and 45 labeled “hard” (taking over an hour). A task is considered successfully resolved when the model’s code modifications pass the associated unit tests; performance is quantified as the percentage of tasks a model successfully resolves.
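For a concrete look at what these tasks contain, here is a minimal sketch of loading the verified subset with the Hugging Face datasets library (we assume the dataset ID princeton-nlp/SWE-bench_Verified; check the Hub for the current name):

from datasets import load_dataset

# Load the 500-task verified split
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
print(len(ds))  # 500

# Each task pairs a real GitHub issue with the repository it must be fixed in
example = ds[0]
print(example["repo"], example["instance_id"])
print(example["problem_statement"][:300])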

Agentic Performance

Below we can see that Devstral (at the time of writing) is the top-performing open-source model on SWE-Bench Verified.

The diagram from the release post indicates that Devstral has both better SWE-Bench Verified performance and a lower parameter count than other open-source models like Gemma 3 27B, Qwen3 235B-A22B, DeepSeek-V3-0324, DeepSeek-R1, and DeepSeek-V3. Devstral’s small size makes it favorable for inference-intensive agentic use cases.

Implementation

While there are a multitude of ways to run Devstral (Hugging Face, Ollama, Kaggle, Unsloth, LM Studio), we will be implementing it with the OpenHands scaffold, All Hands AI’s open-source platform for software development agents. We’ll serve the model itself with vLLM.
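As an aside, if you just want to poke at the model locally before building the full agent setup, a minimal sketch with the Ollama Python client might look like this (we assume the model is published under the devstral tag in the Ollama library, and that you have already pulled it):

import ollama

# Ask the locally pulled Devstral model a quick coding question
response = ollama.chat(
    model="devstral",
    messages=[{"role": "user", "content": "Explain what a race condition is, with a short Python example."}],
)
print(response["message"]["content"])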

Step 1: Set up a GPU Droplet

Begin by setting up a DigitalOcean GPU Droplet: select AI/ML and choose the NVIDIA H100 option.
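If you prefer to script this step instead of using the control panel, a rough sketch with DigitalOcean’s pydo client could look like the following (the region, size, and image slugs here are illustrative assumptions; confirm the available GPU sizes via the API before relying on them):

import os
import pydo

# Authenticate with a DigitalOcean API token
client = pydo.Client(token=os.environ["DIGITALOCEAN_TOKEN"])

# Create a GPU Droplet; the slugs below are assumptions, verify with client.sizes.list()
resp = client.droplets.create(body={
    "name": "devstral-droplet",
    "region": "tor1",            # hypothetical region with GPU capacity
    "size": "gpu-h100x1-80gb",   # assumed single-H100 size slug
    "image": "gpu-h100x1-base",  # assumed AI/ML-ready image slug
})
print(resp["droplet"]["id"])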

Step 2: Web Console

Once your GPU Droplet finishes loading, you’ll be able to open up the Web Console.

Step 3: Install Dependencies

In the web console, copy and paste the following commands to install pip and then vLLM:

apt install python3-pip python3.10
pip3 install vllm --upgrade
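A quick way to confirm the installation worked is to import vLLM and print its version (a simple sanity check; any recent version should do):

import vllm

# An error-free import plus a sensible version string means vLLM is ready
print(vllm.__version__)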

Step 4: Spin up a vLLM Server

Run the following command to download Devstral and spin up an OpenAI-compatible vLLM server. Note that --tensor-parallel-size should match the number of GPUs on your Droplet:

vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
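Once the server reports it is ready, you can sanity-check the endpoint from a second terminal with the OpenAI Python client. vLLM serves on port 8000 by default, and the API key can be any placeholder string unless you started the server with --api-key:

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API at /v1 on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2505",
    messages=[{"role": "user", "content": "Write a one-line docstring for a bubble sort."}],
)
print(response.choices[0].message.content)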
Step 5: Launch OpenHands

With the model server running, pull the OpenHands runtime image and start the application with Docker:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38

You will see a link in the web console; copy it for later use.

Step 6: Open VS Code

In VS Code, click on “Connect to…” in the Start menu.

Choose “Connect to Host…”.

Step 7: Connect to your GPU Droplet

Click “Add New SSH Host…” and enter the SSH command to connect to your Droplet. This command is usually in the format ssh root@[your_droplet_ip_address]. Press Enter to confirm, and a new VS Code window will open, connected to your Droplet. You can find your Droplet’s IP address on the GPU Droplet page.

Step 8: Access OpenHands

In the new VS Code window connected to your Droplet, type >sim and select “Simple Browser: Show”.

Paste the OpenHands URL from the Web Console.

Once OpenHands is launched, there will be a multitude of models to select from; we chose devstral-small-2505. Note that you’ll need an API key.
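If you’re pointing OpenHands at the vLLM server from Step 4 (an assumption; adjust the host and model name to your own setup), the advanced LLM settings would look roughly like the following, where the API key can be any non-empty placeholder since vLLM doesn’t validate it by default:

Custom Model: openai/mistralai/Devstral-Small-2505
Base URL: http://host.docker.internal:8000/v1
API Key: placeholder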

From here, you’ll be able to connect to a repository or launch from scratch.

Conclusion

While LLM reasoning excels at code completion and isolated functions, real-world software engineering demands an understanding of code within broader systems, the ability to discern relationships between components, and the precision to identify subtle errors within intricate functions. These are capabilities that Devstral is designed to address. We hope you get the chance to try Devstral for yourself.

How does Devstral perform for your software engineering needs? Comment below!


About the author

Melani Maheswaran